fix: reduce driver/base import overhead#1615
Conversation
ArnavBalyan
left a comment
There was a problem hiding this comment.
Lgtm thanks for the change, some minor comments. The perf improvement looks great!
| # under the License. | ||
|
|
||
|
|
||
| def test_driver_base_public_imports(): |
There was a problem hiding this comment.
The test does not cover future regressions, should we add an assert to ensure new code cannot eagerly load the dependency.
def test_driver_import_does_not_load_heavy_modules():
import sys
for mod in ("pandas", "numpy"):
sys.modules.pop(mod, None)
import hamilton.driver
for mod in ("pandas", "numpy"):
assert mod not in sys.modules, f"{mod} was loaded eagerly"
Along the above lines
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| def _get_pandas(): |
There was a problem hiding this comment.
There are many _get_dependency functions hard to track/standardize in the future, can we put them behind a shared helper.
Along the following lines:
@cache
def _lazy_import(module: str, attr: str | None = None):
"""Lazy import a module. Cached after 1st call."""
mod = importlib.import_module(module)
return mod if attr is None else getattr(mod, attr)
pd = _lazy_import("pandas")
np = _lazy_import("numpy")
pde = _lazy_import("pandas.core.indexes.extension")
c61fe33 to
165d06e
Compare
|
Thanks for the review! I addressed both comments and resolved the latest conflict with
Validation run:
I also tried the relevant upstream data-quality tests locally, but this environment could not collect them cleanly because extension loading hit the local |
Closes #1246.
This PR reduces import-time overhead for
from hamilton import driver, baseby deferring import-time work that is not needed during the initialdriver/baseimport path.Changes
from hamilton import driver, baseusingpython -X importtimeandcProfile.driver/baseimport path.from hamilton import driver, basefrom hamilton.driver import Driverfrom hamilton import baseHow I tested this
Import smoke test: